Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda
Abstract
Temporal difference (TD) algorithms are attractive for reinforcement learning because they are easy to implement and use “bootstrapped” return estimates to make efficient use of sampled data. In particular, TD(λ) methods comprise a family of reinforcement learning algorithms that often yield fast convergence by averaging multiple estimators of the expected return. However, TD(λ) averages these estimators in one specific way, determined by the fixed parameter λ, and this choice may not lead to optimal convergence rates in all settings. In this paper, we derive an automated Bayesian approach to setting λ that we call temporal difference Bayesian model averaging (TD-BMA). Empirically, TD-BMA always performs as well as, and often much better than, the best fixed λ for TD(λ) (even when performance for different values of λ varies across problems), without requiring that λ or any analogous parameter be manually tuned.
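As background for the averaging that the abstract refers to, the following is a minimal sketch of the standard forward-view λ-return, which weights the n-step return estimators geometrically by a fixed λ. It illustrates ordinary TD(λ) only; it is not the TD-BMA procedure, which the abstract does not specify. The function names, the list-based trajectory layout, and the convention that the terminal state has value zero are assumptions made for illustration.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], values: Sequence[float],
                  t: int, n: int, gamma: float) -> float:
    """n-step return from time t (assumed layout: rewards[k] follows state k,
    values[k] estimates state k, and the terminal state has value 0)."""
    T = len(rewards)
    end = min(t + n, T)
    # Discounted sum of the rewards actually observed.
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    # Bootstrap from the value estimate only if the episode has not ended.
    if end < T:
        g += gamma ** (end - t) * values[end]
    return g


def lambda_return(rewards: Sequence[float], values: Sequence[float],
                  t: int, lam: float, gamma: float) -> float:
    """Forward-view lambda-return: a geometric average of n-step returns."""
    horizon = len(rewards) - t  # steps remaining until termination
    total = 0.0
    for n in range(1, horizon):
        weight = (1.0 - lam) * lam ** (n - 1)
        total += weight * n_step_return(rewards, values, t, n, gamma)
    # All remaining weight goes to the full (Monte Carlo) return.
    total += lam ** (horizon - 1) * n_step_return(rewards, values, t, horizon, gamma)
    return total
```

With λ = 0 this reduces to the one-step TD target, and with λ = 1 it reduces to the Monte Carlo return; intermediate values interpolate between the two with weights fixed in advance, which is the specific averaging scheme that the abstract proposes to replace with a Bayesian one.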
Publication date: 2010